How Many Languages Can a Language Model Model?

Author

  • Robert Östling
Abstract

One of the purposes of the VarDial workshop series is to encourage research into NLP methods that treat human languages as a continuum, by designing models that exploit the similarities between languages and variants. In my work, I am using a continuous vector representation of languages that allows modeling and exploring the language continuum in a very direct way. The basic tool for this is a character-based recurrent neural network language model conditioned on language vectors whose values are learned during training. By feeding the model Bible translations in a thousand languages, not only does the learned vector space capture language similarity, but by interpolating between the learned vectors it is possible to generate text in unattested intermediate forms between the training languages.

Biography

Robert Östling is working on ways to use parallel corpora in computational linguistics, including machine translation, cross-language learning and language typology.
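The conditioning and interpolation described in the abstract can be illustrated with a toy sketch. This is not the author's implementation: the class `TinyCharRNN`, the helper `interpolate`, all dimensions, the simple Elman-style recurrence, and the choice to concatenate the language vector with the character embedding are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def interpolate(u, v, alpha):
    """Linear interpolation (1 - alpha) * u + alpha * v between two language vectors."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(u, v)]


class TinyCharRNN:
    """Toy character-level RNN whose every step also sees a language vector.

    Hypothetical illustration only: weights are random, nothing is trained.
    """

    def __init__(self, vocab, n_langs, char_dim=8, lang_dim=4, hidden=16, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.vocab = vocab
        self.hidden = hidden
        self.char_emb = mat(len(vocab), char_dim)
        # One vector per language; in a real model these are learned during training.
        self.lang_emb = mat(n_langs, lang_dim)
        self.w_in = mat(hidden, char_dim + lang_dim)
        self.w_rec = mat(hidden, hidden)
        self.w_out = mat(len(vocab), hidden)

    def step(self, h, char_id, lang_vec):
        """Consume one character and return (new hidden state, next-char probabilities)."""
        # Condition on the language by concatenating its vector to the character embedding.
        x = self.char_emb[char_id] + list(lang_vec)
        h_new = [
            math.tanh(
                sum(w * xi for w, xi in zip(self.w_in[j], x))
                + sum(w * hi for w, hi in zip(self.w_rec[j], h))
            )
            for j in range(self.hidden)
        ]
        logits = [sum(w * hi for w, hi in zip(row, h_new)) for row in self.w_out]
        # Numerically stable softmax over the vocabulary.
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        return h_new, [e / z for e in exps]


# Usage: predict the next character for a point halfway between two languages.
model = TinyCharRNN(vocab="abc ", n_langs=2)
midpoint = interpolate(model.lang_emb[0], model.lang_emb[1], 0.5)
h, probs = model.step([0.0] * model.hidden, 0, midpoint)
```

Because the language enters the model only as a vector, any point in the vector space is a valid input, which is what makes interpolated "intermediate" languages possible in this kind of setup.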




Publication date: 2016